Abstract

Since the seminal work of F. Cohen in the eighties, abstract virology
has seen the emergence of successive viral models, all based on
Turing-equivalent formalisms. However, these viral models only
partially cover recent malware such as rootkits or k-ary codes. The
problem is that Turing-equivalent models do not support interactive
computations. New models have thus appeared that support this evolved
malware, but at the cost of losing the unified approach. This article
provides a basis for a unified malware model founded on process
algebras, and in particular the Join-Calculus. In terms of
expressiveness, the new model supports the fundamental definitions
based on self-replication, and its added support for interactions,
concurrency and non-termination allows the definition of more complex
behaviors. Evolved malware such as rootkits can now be thoroughly
modeled. In terms of detection and prevention, the fundamental results
of undecidability and isolation still hold. However, the process-based
model has made it possible to establish new results: identification of
fragments of the Join-Calculus where malware detection becomes
decidable, a formal definition of the non-infection property, and
approximate solutions to restrict malware propagation.

Index Terms 

Malware theoretical models - Malware detection
and prevention - Process Algebra - Information flow.

1. Introduction 

Looking at recent publications, process calculi such as the
π-calculus are widespread in the modeling of biological systems,
either molecular-based or cellular-based [1], [2]. Computer virology
is a domain where numerous parallels can be drawn between infectious
diseases and malicious codes, commonly called malware. A question
naturally arises: are process calculi also suited to computer
virology?

1.1. Related works and contribution 

Considering malware, a recent article underlines the fact that
interactions with the execution environment, concurrency and
non-termination prove to be important computation functionalities
[3]. Indeed, malware, being resilient and adaptive by nature, uses
these functionalities intensively to survive and infect new systems.
The theoretical models existing in abstract virology mainly focus on
the self-replication capacity, which is defined in a purely
functional way [4], [5, Chpt.2-3], [6]. Unfortunately, these models
rely on Turing-equivalent formalisms which can hardly support
interactive computations. With the emergence of interaction-based
viral techniques, new models have thus been introduced to cope with
this drawback, but at the cost of losing the unified approach. The
emergence of k-ary malware is an obvious example. Such malware
relies heavily on concurrency by distributing the malicious code
over several executing parts. A new model based on Boolean functions
has been provided to model their evolving interdependence over time
[7]. A second relevant example is the emergence of reactive
non-terminating techniques such as stealth, currently deployed in
rootkits. Different models have been provided to cover stealth,
based either on steganography [8] or graph theory [9].

According to [3], by evolving towards interaction-dedicated
formalisms such as process calculi, a unified reference model for
malware could be defined to support these innovative techniques.
Generally speaking, process calculi model the computing notion of
process, that is to say an executing entity, mobile and communicating
inside a context [10]. This perspective is closer to our current view
of computer systems. The problem is now to choose the most suitable
process calculus among the existing ones. In order to keep the
expressiveness of former models based on self-replication, the chosen
process calculus must support both functional and interactive
aspects. After study, the Join-Calculus was found to be the most
adequate for building a malware model [11], [12].

As previously said, moving towards process calculi brings the malware
model closer to reality while offering greater expressiveness. The
model still provides reasoning and proof facilities since it relies
on an established theoretical formalism. But this is not the only
benefit. The interactive aspects increase the visibility of
computations and information flows. As a consequence, the
identification of potential detection methods and the localization of
possible control points become correspondingly easier. The
contribution of this article can be summed up in the following
points:

• Elaboration of a new viral model based on the 
Join-Calculus. Starting from the self-replication 
mechanism from functional models, this new 
model subsequently extends their expressiveness 
to support interactions, concurrency and infinite 
reactive computations. 

• Extension of the viral model to generic malware 
through a parametrization of the key components: 
the replication mechanism, the research of the 
replication target and the payload. 

• Study of the impact of the formalism migration 
on the fundamental results concerning detection 
and prevention. 

The article is organized as follows. A short introduction to the
Join-Calculus closes this introduction. Section 2 briefly summarizes
the functional notion of self-replication in former viral models.
Section 3 introduces the new process-based model, which allows the
definition of a distributed, context-dependent version of
self-replication. Section 4 extends the model to generic malware,
with an example of model parametrization to support companion viruses
and rootkits. Once the model is established, Section 5 addresses the
existence of an algorithm either to detect malware relative to a
system context or to assess the resistance of system contexts
relative to a given class of malware. Finally, Section 6 focuses on
proactive solutions for malware prevention.



1.2. Introducing the Join-Calculus 

This minimal introduction is only given to keep the article
self-contained. Readers interested in a thorough introduction are
referred to the relevant literature [11], [12]. At the basis of the
Join-Calculus, an infinite set N of names x, y, z, ... is defined.
Names can be grouped into vectors using the notation $\vec{x}$,
equivalent to $x_0, x_1, \ldots, x_n$. Names constitute the basic
blocks for message emissions of the form $x\langle v\rangle$, where x
is called the channel and v the transmitted message. Given in
Figure 1, the syntax of the Join-Calculus defines three different
elements to handle message passing: processes (P), which are the
communicating entities, definitions (D), which describe the system
evolution resulting from interprocess communication, and
join-patterns (J), which describe the channels and messages involved
in the communication [11, pp.57-60].

For ease of modeling, the syntactic facilities offered by the support
of expressions (E) have been introduced [11, pp.91-92]. Among other
things, these facilities can model the synchronous channels necessary
to concurrent functional languages. Notice that these additional
facilities can be translated into the minimal core of the
Join-Calculus.



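\[
\begin{array}{rcll}
P & ::= & v\langle E_1; \ldots; E_n\rangle & \text{asynchronous message} \\
  & \mid & \mathbf{def}\ D\ \mathbf{in}\ P & \text{local definition} \\
  & \mid & P \mid P & \text{parallel composition} \\
  & \mid & 0 & \text{null process} \\
  & \mid & E; P & \text{sequence} \\
  & \mid & \mathbf{let}\ x_1, \ldots, x_m = E\ \mathbf{in}\ P & \text{expression computation} \\
  & \mid & \mathbf{return}\ E_1, \ldots, E_n\ \mathbf{to}\ x & \text{synchronous return} \\
E & ::= & v(E_1; \ldots; E_n) & \text{synchronous call} \\
  & \mid & \mathbf{def}\ D\ \mathbf{in}\ E & \text{local definition} \\
  & \mid & E; E & \text{sequence} \\
  & \mid & \mathbf{let}\ x_1, \ldots, x_m = E\ \mathbf{in}\ E & \text{synchronous call} \\
D & ::= & J \triangleright P & \text{reaction rule} \\
  & \mid & D \wedge D & \text{definition conjunction} \\
  & \mid & \top & \text{null definition} \\
J & ::= & x\langle y_1, \ldots, y_n\rangle & \text{message pattern} \\
  & \mid & x(y_1; \ldots; y_n) & \text{call pattern} \\
  & \mid & J \mid J & \text{join of patterns}
\end{array}
\]

Figure 1. Enriched syntax for the Join-Calculus.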



Based on the syntax, names are divided into three sets: 1) the
channels defined through a join definition (dv), 2) the names bound
by a join-pattern (rv) and 3) the free names (fv). The inductive
construction of these sets can be found in [11, p. 47]. In addition
to the syntax, an operational semantics is required to establish the
computational model. The semantics is given by a Reflexive Chemical
Abstract Machine (RCHAM) described by the rules of Figure 2
[11, pp.56-62]. In particular, the reduction rule describes the
system evolution after the resolution of an exchange of messages. The
reduction only occurs if the exchanged messages satisfy the
join-pattern of an existing definition:
\[
\mathbf{def}\ x(\vec{z}) \triangleright P\ \mathbf{in}\ x(\vec{y}) \longrightarrow P\{\vec{y}/\vec{z}\}
\]
where $\{\vec{y}/\vec{z}\}$ is the name substitution.
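The reduction step described above can be illustrated with a minimal Python sketch. It is not part of the Join-Calculus literature: the names `try_reduce` and `cell_write`, and the encoding of a chemical solution as a list of `(channel, value)` pairs, are assumptions made only for this illustration.

```python
def try_reduce(rule, solution):
    """Attempt one reduction step on a toy chemical solution.

    rule:     (pattern, body); pattern is a list of channel names (a join
              of message patterns), body maps the received values, in
              pattern order, to the list of messages emitted by the reaction.
    solution: list of (channel, value) pending messages.
    Returns the new solution, or None when the join pattern is not satisfied.
    """
    pattern, body = rule
    remaining = list(solution)
    received = []
    for channel in pattern:
        # Look for one pending message on each channel of the join pattern.
        match = next((m for m in remaining if m[0] == channel), None)
        if match is None:
            return None              # pattern not satisfied: no reduction
        remaining.remove(match)      # the message is consumed
        received.append(match[1])
    # Substitute the received values for the bound names of the pattern.
    return remaining + body(*received)


# Hypothetical rule: (write(new) | content<old>) reacts into content<new>.
cell_write = (["write", "content"], lambda new, old: [("content", new)])
```

With the solution `[("content", 0), ("write", 7)]`, `try_reduce(cell_write, ...)` consumes both messages and leaves `[("content", 7)]`; with only the `write` message pending, no reaction occurs.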



Rules: STR-JOIN, STR-NULL, STR-AND, STR-NODEF, STR-DEF, RED.

Substitution conditions:

- STR-DEF: $\sigma_{dv}$ substitutes the defined channels from
$dv[D]$ using freshly generated, distinct names.
- RED: $\sigma_{rv}$ substitutes the transmitted messages for the
bound names from $rv[J]$.

Figure 2. Join-Calculus operational semantics.
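For reference, the six rules named in Figure 2 are usually stated as follows in the Join-Calculus literature. This is the standard formulation of the RCHAM, assumed here to match the rules of the figure; $\rightleftharpoons$ denotes a reversible structural step and $\longrightarrow$ a reduction step.

```latex
\[
\begin{array}{lrcl}
\text{STR-JOIN}  & \vdash P \mid Q & \rightleftharpoons & \vdash P,\ Q \\
\text{STR-NULL}  & \vdash 0 & \rightleftharpoons & \vdash \\
\text{STR-AND}   & D \wedge D' \vdash & \rightleftharpoons & D,\ D' \vdash \\
\text{STR-NODEF} & \top \vdash & \rightleftharpoons & \vdash \\
\text{STR-DEF}   & \vdash \mathbf{def}\ D\ \mathbf{in}\ P & \rightleftharpoons & D\sigma_{dv} \vdash P\sigma_{dv} \\
\text{RED}       & J \triangleright P \vdash J\sigma_{rv} & \longrightarrow & J \triangleright P \vdash P\sigma_{rv}
\end{array}
\]
```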



2. Autonomous self-replication in virology 

The notion of self-replication is at the heart of computer virology
since it is the common denominator between the different classes of
viruses and worms. Referring to the early works of J. von Neumann
[13], two fundamental concepts are required for self-replication: a
replication mechanism and the existence of a self-description, also
called self-reference.

As corroborated by successive publications [14], [4],
[5, Chpt.2-3], [6], self-replication proves to be directly linked to
the concept of recursion present in the different computation
paradigms. In these different functional viral models, all
Turing-equivalent according to the Church-Turing thesis [15], both
the self-reference and the replication mechanism can be identified.
Let us consider Definition 1, extracted from [6]. This virus
definition remains the most expressive and flexible viral model, and
it proves to be compatible with former ones. As a consequence of
Kleene's recursion theorem [15], a virus is built as the solution of
a fixed point equation.

Definition 1: Using a Gödel numbering, programs are indexed by
integers and $\varphi_p(x)$ denotes the computation of the program
indexed by p over the argument x. According to Bonfante, Kaczmarek
and Marion, a virus v is a program which, for all values of p and x
over the computation domain D, satisfies the equation
$\varphi_v(p, x) = \varphi_{\beta(v,p)}(x)$, where $\beta$ denotes
the propagation method.



In this definition, the concepts necessary to self-replication are
explicitly defined. The replication mechanism is defined through the
propagation function $\beta$. As for the self-reference, it is
denoted by the program v, which is considered both as an executed
program and as a parameter of the propagation function, depending on
whether it appears on the left or the right side of the equation. The
program p is called the target of the replication, and the function
$\beta$ implicitly contains a research routine for selecting a new
valid target for the next replication. These different terms are
important and must be kept in mind since they are reused throughout
the article.
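The self-reference obtained through a fixed point can be illustrated by a program text that rebuilds its own description when executed. The following Python sketch is only an illustration of that idea, not the construction of [6]; the names `TEMPLATE`, `SOURCE`, `run` and `propagate` are hypothetical, and the "propagation" is a harmless append to a list.

```python
# Hypothetical illustration: a program text that recomputes its own description.
TEMPLATE = 'TEMPLATE = %r\nSOURCE = TEMPLATE %% TEMPLATE\n'
SOURCE = TEMPLATE % TEMPLATE   # the self-description (a fixed point of `run`)


def run(source: str) -> str:
    """Execute a program text and return the description it computes of itself."""
    env: dict = {}
    exec(source, env)
    return env["SOURCE"]


def propagate(source: str, target: list) -> None:
    """Toy propagation method: store the self-description in a target list."""
    target.append(run(source))
```

Running `run(SOURCE)` yields `SOURCE` again, so a copy stored by `propagate` is itself able to reproduce the same description, which is the property the fixed point equation captures.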

3. Distributed self-replication 

As underlined by M. Webster in his classification [16],
self-replicating systems, and in particular viruses, do not
necessarily contain their own self-reference access or their own
replication mechanism. They often rely on external services to
access these fundamental elements. Let us consider an interpreted
virus in bash [5, Chpt.7]: the replication is achieved using
commands provided by the language such as cp, and the self-reference
is accessed through $0. The advantages offered by process calculi in
terms of modeling therefore become undeniable: exchanges between the
processes and their environment, and possible distribution of the
computations.

As seen in the previous section, the self-reference notion is
required for self-replication to be modeled functionally; the same
holds for process modeling. In order to reference themselves,
programs must be built as process abstractions (definitions with a
single pattern):
$D_p \stackrel{\mathrm{def}}{=} p(\vec{arg}) \triangleright P$, where
P is defined as a function of the argument vector $\vec{arg}$. The
program execution is therefore a process instantiation of the
abstraction:
$E_p \stackrel{\mathrm{def}}{=} \mathbf{def}\ D_p\ \mathbf{in}\ p(\vec{val})$.
This hypothesis is kept throughout the article even when it is not
explicitly recalled. Based on this hypothesis, self-replication
becomes the emission of this definition on an external channel, this
channel being the target of the replication.

Definition 2: (SELF-REPLICATION) A program is said to be
self-replicating over a channel c, where c is the replication
target, if it can be modeled as a Join-Calculus definition capable
of propagating itself (i.e. extruding itself beyond its scope) on
this channel. This definition can be translated as follows:
$\mathbf{def}\ s(c, \vec{x}) \triangleright R$ where
$R \Downarrow_{c(s)}$. Here $\Downarrow_{c(s)}$ is the barb predicate
where the value transmitted over the channel c is no longer open to
any name but restricted to s. In this definition, s denotes the
self-reference whereas R specifies the replication mechanism over c.
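As an informal illustration of Definition 2, the sketch below models a channel as a queue and a self-replicating program as a function whose body emits its own definition on the target channel. The `Channel` class and the function name `s` are assumptions made for this example; Python function objects stand in for Join-Calculus definitions.

```python
from collections import deque


class Channel:
    """Asynchronous channel: messages are queued until they are consumed."""

    def __init__(self):
        self.queue = deque()

    def send(self, value):
        self.queue.append(value)

    def receive(self):
        return self.queue.popleft()


def s(c, *args):
    """Process abstraction whose body emits its own definition on channel c."""
    c.send(s)  # the name `s` plays the role of the self-reference


target = Channel()
s(target)                 # the definition is extruded on the target channel
copy = target.receive()   # what was received is the definition itself
```

The received value `copy` is the definition `s` itself, so invoking it on another channel replicates it again.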

3.1. Modeling the environment 

Before speaking of any distribution of the replication, the
execution environment in which processes evolve must be thoroughly
defined. To draw a parallel with Cohen's model, a viral sequence must
be considered with respect to a defined Turing Machine. If the
machine is left undefined, Cohen actually proved in [17] that for any
sequence, a Turing Machine can be found for which this sequence is a
virus.

Process contexts prove to be useful tools to define execution
environments. Let us consider that all execution environments share
an identical global structure that can be defined as a process
context. Generally speaking, a running operating system, just like
any other execution machine, provides different services (system
calls) and resources (memory space, files, registry). Let us define a
system context denoted $C_{sys}[\cdot]_{S\cup R}$ where services and
resources constitute the common bricks, formalized by channel
definitions in the Join-Calculus:

Services: In the Join-Calculus, the available services S can be
modeled by definitions with a behavior similar to execution servers
waiting for queries. The service itself is represented by a function
conveyed by the variable $f_{sv}$. When the service is called, the
application of $f_{sv}$ to the arguments is computed and sent back.

• $\mathbf{def}\ S_{sv}(\vec{arg}) \triangleright \mathbf{return}\ f_{sv}(\vec{arg})\ \mathbf{in}\ \ldots$

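Resources: The set of resources R provides storage facilities
accessible to processes. Resources can be modeled by parametric
processes storing information inside internal channels. Resources can
be either static, providing read and write accesses only (data files,
registry keys), or executable, triggered on command (programs).

• Let us consider three simple variables $c$, $c_{new}$, $c_0$.
\[
\begin{array}{l}
\mathbf{def}\ R_{stat}(c_0) \triangleright \\
\quad \mathbf{def}\ (\mathit{write}(c_{new}) \mid \mathit{content}\langle c\rangle) \triangleright (\mathbf{return\ to}\ \mathit{write} \mid \mathit{content}\langle c_{new}\rangle) \\
\quad \wedge\ (\mathit{read}() \mid \mathit{content}\langle c\rangle) \triangleright (\mathbf{return}\ c\ \mathbf{to}\ \mathit{read} \mid \mathit{content}\langle c\rangle) \\
\quad \mathbf{in}\ \mathit{content}\langle c_0\rangle \mid \mathbf{return}\ \mathit{read}, \mathit{write}\ \mathbf{to}\ R_{stat}\ \mathbf{in}\ \ldots
\end{array}
\]

• Let us consider three functions $f$, $f_{new}$, $f_0$.
\[
\begin{array}{l}
\mathbf{def}\ R_{exec}(f_0) \triangleright \\
\quad \mathbf{def}\ (\mathit{write}(f_{new}) \mid \mathit{content}\langle f\rangle) \triangleright (\mathbf{return\ to}\ \mathit{write} \mid \mathit{content}\langle f_{new}\rangle) \\
\quad \wedge\ (\mathit{read}() \mid \mathit{content}\langle f\rangle) \triangleright (\mathbf{return}\ f\ \mathbf{to}\ \mathit{read} \mid \mathit{content}\langle f\rangle) \\
\quad \wedge\ (\mathit{exec}(\vec{arg}) \mid \mathit{content}\langle f\rangle) \triangleright (\mathbf{return}\ f(\vec{arg})\ \mathbf{to}\ \mathit{exec} \mid \mathit{content}\langle f\rangle) \\
\quad \mathbf{in}\ \mathit{content}\langle f_0\rangle \mid \mathbf{return}\ \mathit{read}, \mathit{write}, \mathit{exec}\ \mathbf{to}\ R_{exec}\ \mathbf{in}\ \ldots
\end{array}
\]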
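The two resource definitions can be approximated in Python by closures over a private cell, which plays the role of the internal channel content. This is a sequential sketch under that assumption; the function names `static_resource` and `executable_resource` are hypothetical and the concurrency of the Join-Calculus is not modeled.

```python
def static_resource(initial):
    """Sketch of a static resource: returns (read, write) accesses."""
    content = [initial]   # private cell standing for the internal channel

    def read():
        return content[0]

    def write(new):
        content[0] = new

    return read, write


def executable_resource(initial):
    """Sketch of an executable resource: adds an exec access."""
    read, write = static_resource(initial)

    def execute(*args):
        # Apply the currently stored function to the arguments.
        return read()(*args)

    return read, write, execute


read, write, execute = executable_resource(lambda x: x + 1)
```

Only the returned accesses are visible to callers; the cell itself cannot be reached directly, which mirrors the scoping of the internal channel in the definitions above.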

This system context, split between services and resources, is
consistent with the current view of computers or, more generically,
with most execution environments. A process alone cannot be
infectious; it is viral only if the services and resources necessary
to replicate are provided by the system, as well as a potential
external target. Considering this vision, the notion of virus can now
be defined relative to a system context by construction of the viral
sets [17].

3.2. Construction of the viral sets

Program replication is formalized by the emission of its definition
on an external channel provided by the environment. Consequently,
the barb predicate defined in the different process calculi is
ill-suited: transmitted values are omitted and, once the program is
placed inside a process context, reactions become internal and thus
no longer observable by a barb predicate. We will thus define a new,
better suited predicate called valued-reaction. Its definition is
given below.

Definition 3: (VALUED-REACTION) Let P, P' be two processes, x a
channel and a a value from P (either bound or free). A
valued-reaction $P\ \triangledown_{x(a)}\ P'$ occurs if and only if
$P = C[x\langle a\rangle]_S$ for some process context $C[\cdot]_S$
capturing x, i.e. $x \in S$, and, by reduction on the join pattern
$x\langle a\rangle$, $P \longrightarrow P'$. The
$\triangledown_{x(a)}$ predicate syntactically checks the possibility
for a process P to react on a channel x with the value a. Once
resolved, the reaction leads to a new process state P'.

Using valued-reaction, we can now define the principle of viable
replication in a given environment. Viable replication guarantees
that the replicated version of a program is still capable of
self-replication. This principle was already present in the
self-reproducing cellular automata of J. von Neumann, where cellular
configurations iteratively rebuild themselves at each transition
[13]. Similarly, the replication is iterated by valued-reactions
through two phenomena:

-Original replication: During the first execution of the program p,
denoted by the process P, p is replicated over a resource channel.
This channel is consumed by the system context to evolve towards a
new state. This is represented by the predicate:

$\exists x \in R,\ C[P]_{S\cup R}\ \triangledown_{x(p)}\ P'$.

-Successive replications: The successive iterations of the
replications are triggered by the activation of the intermediate
infected resources. If $P^{(i)}$ corresponds to the execution of the
i-th infected form, then the
following predicate should be verified:

By definition, the viral sets contain the processes satisfying the
viable self-replication principle. The notion of viral set from
F. Cohen must thus be extended relative to a system context, which
conditions the consumption of the replicated definitions and the
activation of the intermediate infected forms. In fact, these viral
sets can be built following the same method of iterated replications.

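Definition 4: (VIRAL SET) Let us consider a system context
$C_{sys}[\cdot]_{S\cup R}$ where S denotes the available services and
R the accesses to resources. Its viral set $E_v$ can be recursively
constructed as follows.

\[
\begin{array}{l}
E_v(C_{sys}[\cdot]_{S\cup R}) = \{\, V \mid \exists \vec{x} \text{ of size } n \geq 2, \\
\quad \exists \vec{v} \text{ of size } n \geq 2 \text{ and } \vec{\mathit{exec}} \text{ of size } n-1 \text{ such that} \\
\quad C_{sys}[V]_{S\cup R}\ \triangledown_{x_0(v_0)}\ C'_{sys}[V']_{S\cup R'} \\
\quad \text{and for all } 1 \leq i < n, \\
\quad C^{i}_{sys}[\mathit{exec}_i\langle arg_i\rangle]_{S\cup R^{(i)}}\ \triangledown_{x_i(v_i)}\ C^{i+1}_{sys}[V^{(i+1)}]_{S\cup R^{(i+1)}} \\
\}
\end{array}
\]

with the following constraints:

- All $x_i \in \vec{x}$ denote replication targets. They can be
either channels to existing resources, $x_i \in R$, or channels to
dynamically created resources, $x_i \in R^{(i)}$, meaning that
$x_i \in rv(J)$ where J is a join related to a resource definition D
with $dv(D) \subset R$.

- The messages $\mathit{exec}_i\langle arg_i\rangle$ activate the
intermediate infected resources. To avoid biases, they must not
simulate viral activity: $\mathit{exec}_i \in R^{(i)}$ and
$arg_i \notin bv(V)$.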

3.3. Distributed virus replication 

3.3.1. Environment refinement for replication. Con- 
sidering self-replication, several services and resources 
must be defined because they may be externalized by 
the self-reproducing systems [16]: access to the self- 
reference, replication mechanisms and replication tar- 
gets. The structure for services and resources, globally 
defined in the system context from Section 3.1, must 
thus be refined to support these features. The refined 
definitions are given below, with relevant examples
from current operating systems given in Table 1:
Self-reference access: Today's operating systems all maintain a list
of executing processes, with a particular pointer to the active
process. Among other things, this list is used for scheduling. A
service is often provided to access this list, and in particular the
pointed active process which denotes the self-reference. In order to
maintain this list, the program executions must be launched through a
dedicated service.

• $D_{proc} \stackrel{\mathrm{def}}{=} \mathit{proc}_{exec}(p, \vec{args}) \triangleright \mathit{sys}_{updt}(p);\ \mathbf{return}\ p(\vec{args})\ \mathbf{to}\ \mathit{proc}_{exec}$

• $D_{ref} \stackrel{\mathrm{def}}{=} (\mathit{sys}_{updt}(r_{new}) \mid \mathit{current}\langle r_{cur}\rangle) \triangleright \mathit{current}\langle r_{new}\rangle$
$\wedge\ (\mathit{sys}_{ref}() \mid \mathit{current}\langle r_{cur}\rangle) \triangleright (\mathit{current}\langle r_{cur}\rangle \mid \mathbf{return}\ r_{cur}\ \mathbf{to}\ \mathit{sys}_{ref})$

Self-reference access must be considered as a service even if it
uses an internal resource. A solution is to publish only
$\mathit{sys}_{ref}$ and $\mathit{proc}_{exec}$ in S (from
$C_{sys}[\cdot]_{S\cup R}$). Any process placed in the context will
then have no direct write access to the internal channel
$\mathit{current}$ storing the reference. From the process
perspective, the two provided channels will be similar to services.
Replication mechanism: The replication mechanism is a function r
which copies data from an input channel towards an output channel.
The function r has been deliberately left parametric for the model to
remain generic. However, r is strongly constrained to forward the
input data towards the output channel after an indefinite number of
transformations.

• $D_{rep} \stackrel{\mathrm{def}}{=} \mathit{sys}_{rep}(in, out) \triangleright \mathbf{return}\ r(in, out)\ \mathbf{to}\ \mathit{sys}_{rep}$
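Replication targets: A pool of executable resources constitutes the
replication targets. These resources can be preexisting (infection)
or created by the malware (duplication).

\[
\begin{array}{l}
D_{targ} \stackrel{\mathrm{def}}{=} R_{targ}(f_{init}) \triangleright \\
\quad \mathbf{def}\ (\mathit{write}(f_{new}) \mid \mathit{content}\langle f\rangle) \triangleright (\mathbf{return\ to}\ \mathit{write} \mid \mathit{content}\langle f_{new}\rangle) \\
\quad \wedge\ (\mathit{read}() \mid \mathit{content}\langle f\rangle) \triangleright (\mathbf{return}\ f\ \mathbf{to}\ \mathit{read} \mid \mathit{content}\langle f\rangle) \\
\quad \wedge\ (\mathit{exec}(\vec{arg}) \mid \mathit{content}\langle f\rangle) \triangleright (\mathbf{return}\ \mathit{proc}_{exec}(f, \vec{arg})\ \mathbf{to}\ \mathit{exec} \mid \mathit{content}\langle f\rangle) \\
\quad \mathbf{in}\ \mathit{content}\langle f_{init}\rangle \mid \mathbf{return}\ \mathit{read}, \mathit{write}, \mathit{exec}\ \mathbf{to}\ R_{targ}
\end{array}
\]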

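Using the previous definitions, a system with n resources can be
defined as an evaluation context. This context being generic enough
to apply to the majority of existing systems, we will consider this
system context throughout this section for the different definitions
and proofs:

\[
\begin{array}{l}
C_{sys}[\cdot]_{S\cup R} \stackrel{\mathrm{def}}{=} \mathbf{def}\ D_{proc} \wedge D_{ref} \wedge D_{rep} \wedge D_{targ}\ \mathbf{in} \\
\quad \mathbf{let}\ sr_1, sw_1, se_1, \ldots, sr_n, sw_n, se_n = R_{targ}(f_1), \ldots, R_{targ}(f_n) \\
\quad \mathbf{in}\ (\mathit{current}\langle null\rangle \mid [\cdot])
\end{array}
\]

where
$S = \{\mathit{proc}_{exec}, \mathit{sys}_{ref}, \mathit{sys}_{rep}\}$
and $R = \{\vec{sr}, \vec{sw}, \vec{se}\}$.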
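To make the structure of this context more concrete, here is a minimal sequential Python sketch. It assumes that programs are Python callables receiving the context as first argument; the class name `SystemContext` and its attribute names are hypothetical, and the replication function r is passed as a parameter as in the model.

```python
class SystemContext:
    """Toy counterpart of the system context: services S and resources R."""

    def __init__(self, initial_programs, replicate):
        self._current = None                      # internal channel `current`
        self._contents = list(initial_programs)   # contents of the n resources
        self._replicate = replicate               # parametric function r

    # --- services S ---
    def proc_exec(self, program, *args):
        self._current = program                   # effect of sys_updt
        return program(self, *args)

    def sys_ref(self):
        return self._current                      # self-reference access

    def sys_rep(self, source, out):
        return self._replicate(source, out)

    # --- resource accesses R (index k selects the resource) ---
    def sr(self, k):
        return self._contents[k]

    def sw(self, k, new):
        self._contents[k] = new

    def se(self, k, *args):
        return self.proc_exec(self._contents[k], *args)


ctx = SystemContext([lambda c, x: x + 1], lambda src, out: out(src))
```

The cell `_current` is only updated through `proc_exec`, mirroring the choice of publishing the execution and reference services rather than the internal channel itself.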



Services provided by well-known operating systems

Channels              Linux APIs              Windows APIs
$proc_{exec}$         fork(), exec()          CreateProcess()
$sys_{ref}$           getpid(), readlink()    GetCurrentProcess(), GetModuleFileName()
$sys_{rep}$           sendfile()              CopyFile()
Resource accesses     fread(), fwrite()       ReadFile(), WriteFile()

Table 1. Parallel between refined channels and
equivalent OS services and resource accesses.



3.3.2. Classes of self-replicating viruses. Using this refined
system context, the four classes of self-replicating viruses from
M. Webster [16] can be defined in this process-based model. Through
these four classes, the important components required for autonomous
replication can be found (see Section 2): the access to the
self-reference, a replication mechanism denoted by the function r and
a target research routine denoted by the function t. These last two
functions have been deliberately left parameterizable.

Through parametrization, several types of replication can be
supported, for example: (1) overwriting infections, which can be
defined by $\mathbf{def}\ r(v, sw) \triangleright sw(v)$, (2) append
infections (respectively prepend infections), with a definition of
the form
$\mathbf{def}\ r(v, sw, sr) \triangleright (\mathbf{let}\ p = sr()\ \mathbf{in}\ \mathbf{def}\ p_1(\vec{arg}) \triangleright v(); p(\vec{arg})\ \mathbf{in}\ sw(p_1))$,
(3) companion infections, described in a coming section because of
their greater modeling complexity.
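The first two parametrizations of r can be sketched in Python, with programs modeled as callables and the accesses sw and sr passed as functions. The names `overwrite` and `prepend` and the list-based cell used in the example are assumptions made for this illustration.

```python
def overwrite(v, sw):
    """Overwriting infection: the target content is replaced by v."""
    sw(v)


def prepend(v, sw, sr):
    """Prepend infection: the new content runs v, then the original program."""
    p = sr()                 # read the original program

    def p1(*args):
        v()                  # viral code first
        return p(*args)      # then the original behavior is preserved

    sw(p1)                   # write the combined program back


# Hypothetical target: a one-slot cell holding a doubling program.
cell = [lambda x: x * 2]
log = []
prepend(lambda: log.append("v"), lambda new: cell.__setitem__(0, new), lambda: cell[0])
result = cell[0](4)
```

After the prepend infection, the resource still returns the original result while the viral code has been executed beforehand, whereas `overwrite` would discard the original program entirely.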

With regard to the concept of self-replication from Definition 2,
the virus case is particular since the target of the replication is
no longer passed as a parameter but chosen by an internal research
routine. The behavior of this routine, just like the replication
mechanism, must remain parameterizable. Generally speaking,
successive replications follow three main schemes: (1) targets are
hard-coded in the virus, like a predefined file path for example,
meaning that the target channel will always be the same n, such as
$\mathbf{def}\ t() \triangleright \mathbf{return}\ n\ \mathbf{to}\ t$,
(2) target resources are dynamically created by the routine using the
facilities of the system,
$\mathbf{def}\ t() \triangleright \mathbf{let}\ sr, sw, se = R(empty)\ \mathbf{in}\ \mathbf{return}\ sw\ \mathbf{to}\ t$,
(3) targets are discovered by running through the system searching
for vulnerable resources. Directory exploration is a typical example.
Once again, this example is too complex to be briefly described here.
The target research must be integrated in the virus definition, in
addition to the self-reference access and the replication mechanism.
Based on this parametric approach, as well as on the provided
modeling of the system context, four main classes of viruses can be
defined according to the exported services.

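Definition 5: Let V be a viral process. Let R and S be the
definitions of the sub-processes responsible for the replication
mechanism and the self-reference access, respectively. An additional
definition T is responsible for researching the target of the
infection. Finally, a process P is introduced for the post-infection
process, i.e. the payload:

• $R \stackrel{\mathrm{def}}{=} \mathit{loc}_{rep}(in, out) \triangleright \mathbf{return}\ r(in, out)\ \mathbf{to}\ \mathit{loc}_{rep}$
where r is a parametric function defining the replication mechanism.

• $S \stackrel{\mathrm{def}}{=} \mathit{loc}_{ref}() \triangleright \mathbf{return}\ v\ \mathbf{to}\ \mathit{loc}_{ref}$.

• $T \stackrel{\mathrm{def}}{=} \mathit{loc}_{targ}() \triangleright \mathbf{return}\ t()\ \mathbf{to}\ \mathit{loc}_{targ}$
where t is a parametric function defining the routine for target
research.

• P is any process modeling a payload.

Four classes of viruses can be defined using these primitives and
the system services:

• (Class I) V is totally autonomous:

$V_{I} \stackrel{\mathrm{def}}{=} \mathbf{def}\ v(\vec{x}) \triangleright (\mathbf{def}\ S \wedge R \wedge T\ \mathbf{in}\ \mathit{loc}_{rep}(\mathit{loc}_{ref}(), \mathit{loc}_{targ}()); P)\ \mathbf{in}\ \mathit{proc}_{exec}(v, \vec{a})$

• (Class II) V uses an external replication mechanism provided by
the system:

$V_{II} \stackrel{\mathrm{def}}{=} \mathbf{def}\ v(\vec{x}) \triangleright (\mathbf{def}\ S \wedge T\ \mathbf{in}\ \mathit{sys}_{rep}(\mathit{loc}_{ref}(), \mathit{loc}_{targ}()); P)\ \mathbf{in}\ \mathit{proc}_{exec}(v, \vec{a})$

• (Class III) V uses an external access to the self-reference
provided by the system:

$V_{III} \stackrel{\mathrm{def}}{=} \mathbf{def}\ v(\vec{x}) \triangleright (\mathbf{def}\ R \wedge T\ \mathbf{in}\ \mathit{loc}_{rep}(\mathit{sys}_{ref}(), \mathit{loc}_{targ}()); P)\ \mathbf{in}\ \mathit{proc}_{exec}(v, \vec{a})$

• (Class IV) V uses only external services:

$V_{IV} \stackrel{\mathrm{def}}{=} \mathbf{def}\ v(\vec{x}) \triangleright (\mathbf{def}\ T\ \mathbf{in}\ \mathit{sys}_{rep}(\mathit{sys}_{ref}(), \mathit{loc}_{targ}()); P)\ \mathbf{in}\ \mathit{proc}_{exec}(v, \vec{a})$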

In this definition, the research routine T is always internal to the
virus. However, the definition would support the distribution of this
functionality. This case has not been included in the definition
because, to our knowledge, no malware completely externalizes this
functionality. On the other hand, since it runs through the
environment, the target research is likely to use the system services
intensively.

Proposition 1: If the system context $C_{sys}[\cdot]_{S\cup R}$
provides the right services and valid targets, the four virus classes
$V_{I}$, $V_{II}$, $V_{III}$ and $V_{IV}$ achieve viable replication,
i.e. these classes are included in the viral set
$E_v(C_{sys}[\cdot]_{S\cup R})$.

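Proof: Let us consider a system context with several potential
resources as defined in this section. Let us consider a simple case
of parameterization for the replication mechanism r and the target
research t. Notice that other definitions could be used without
modifying the core of the proof: only additional reductions would be
necessary.

$\mathbf{def}\ r(x, w) \triangleright w(x)$

$\mathbf{def}\ t() \triangleright \mathbf{return}\ sw_i\ \mathbf{to}\ t$ at the i-th iteration.

Let us consider the case of the third class of virus with the
following notations:

\[
\begin{array}{l}
D_{V_{III}} \stackrel{\mathrm{def}}{=} v() \triangleright (\mathbf{def}\ R \wedge T\ \mathbf{in}\ \mathit{loc}_{rep}(\mathit{sys}_{ref}(), \mathit{loc}_{targ}()); P). \\
D_{R_k} \stackrel{\mathrm{def}}{=} (sw_k(f_{new}) \mid \mathit{content}_k\langle f\rangle) \triangleright (\mathit{content}_k\langle f_{new}\rangle) \\
\quad \wedge\ (sr_k() \mid \mathit{content}_k\langle f\rangle) \triangleright (\mathit{content}_k\langle f\rangle \mid \mathbf{return}\ f\ \mathbf{to}\ sr_k) \\
\quad \wedge\ (se_k(\vec{arg}) \mid \mathit{content}_k\langle f\rangle) \triangleright (\mathit{content}_k\langle f\rangle \mid \mathbf{return}\ \mathit{proc}_{exec}(f, \vec{arg})\ \mathbf{to}\ se_k).
\end{array}
\]

To prove viable replication, it must be proven that the viral
function v initially infects a resource, but also that an execution
request $se_1(a_1)$ reproduces the infection towards a second writing
channel $sw_2$. The next iterations can then be reduced to these two
cases: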

Initial infection: 

Successive infections: 

C'ays [se\ (Ol )] SUH V^O VsTO2 (p) ^aya [P\ SUR- 




h Csya[VlIl]suR 

^ (str-def+str-and) 




Dproc^ Dref^ Drep^ Dtarg ^ 

let sri, swi.sei, .... sr^, swn, se„ = Rtargifl), 

in {ci! rrctit<'nifll> Vjjj) 
• (rcact+sii-dcf+Nlr-and) 


Htargifn.) 


Dproc^ Dref} Drep^ Dtarg^ ^ Rl ' ^ Rn 

contenti<fi> \ Tl'^_2Contenti<fi> \ 
current<null> \defv Dy^^j in proCexec{v, a) 
> (str-def) 


Dproc, Dj.f.j,Drep, Dtarg, Dr.^ , ...,Dr^ , Dy^ j j 

contenti<fi> \ Tl"_2Contenti<fi> \ 
current<null> \ procexec('v,~a) 
> (react) 




Dproc, Dj.gf, Drep, Dfarg, Dr^ , Dr^ , Dy^^j 

contenti<fi> \ Tlf_2Contenti<fi> \ 
current<null> \ sySupdt{'v).v{'S) 
— * (react) 




Dproc, Drcf, Drep, Dtarg, Dr^ , Dr^ , Dy^^j ^{f } 

contenti<fi> \ Try_2Contenti<fi> \ current<v> \ v{~a) 
— > (react) 


Dproc, Dref, Drep, Dtarg, Dr-^ , Dr^ , Dy^ ^ j ^ {v} 

contenti<fi> \ tl"_2Contenti<fi> \ current<v> \ 

defR AT in loCrep(sySref{),loCtarg()).P 

^ (str-def+str-and) 



Dproc, Drcf , Drep, Dtarg, Dr.^ , Dr^ , Dyjjj , i?, T 



contenti<fi> \ Ii"_2Contenti<fi> \ current<v> \ 

loCrep {sySrcf {) , loCtarg ()) -P 

— > (react) 

Dproc, Dref, Drep , Dtarg , D R^ , Dr^, Dy^^^ , R, T h^^j 

contenti<fi> \ n"^2Cowtenti</j> | current<v> \ 

loCrep{v,loCtargO).P 

— * (react) 

Dproc, Dref, Drep, Dtarg, Dr^ , D R^ , Dy^^^ , R, T 

contenti<fi> \ Tl"_2Contenti<fi> \ current<v> \ 

loCrepiv, SWl).P 

— > (react) 

Dproc, Dref , Drep, Dtarg, Dr-^ , Dr^ , -Dv/// i ^^i T ^ {v} 

contenti<fi> \ Ii"_2Contenti<fi> \ current<v> \ 

swi{v).P 

— > (react) 

Dproc, Dref, Drep, Dtarg, Dr^ , D R^ , -Dvj/j i R, T ^ {v} 

contenti<v> \ Tl"_2Contenti<fi> \ current<v> \ P 

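Once the initial replication is achieved, the second replication is
activated from the current state by an execution request
$se_1(a_1)$.

\[
\begin{array}{l}
D_{proc}, D_{ref}, D_{rep}, D_{targ}, D_{R_1}, \ldots, D_{R_n}, D_{V_{III}}, R, T \vdash_{\{v\}} \\
\quad \mathit{content}_1\langle v\rangle \mid \mathit{content}_2\langle f_2\rangle \mid \prod_{i=3}^{n} \mathit{content}_i\langle f_i\rangle \mid \mathit{current}\langle v\rangle \mid se_1(a_1) \\[2pt]
\longrightarrow^{*} \text{(react)} \\
D_{proc}, D_{ref}, D_{rep}, D_{targ}, D_{R_1}, \ldots, D_{R_n}, D_{V_{III}}, R, T \vdash_{\{v\}} \\
\quad \mathit{content}_1\langle v\rangle \mid \mathit{content}_2\langle f_2\rangle \mid \prod_{i=3}^{n} \mathit{content}_i\langle f_i\rangle \mid \mathit{current}\langle v\rangle \mid \mathit{proc}_{exec}(v, a_1)
\end{array}
\]

From there, the reduction is identical to the previous one except
for the call to $\mathit{loc}_{targ}$, which is reduced to $sw_2$ and
no longer to $sw_1$.

\[
\begin{array}{l}
D_{proc}, D_{ref}, D_{rep}, D_{targ}, D_{R_1}, \ldots, D_{R_n}, D_{V_{III}}, R, T, R', T' \vdash_{\{v\}} \\
\quad \mathit{content}_1\langle v\rangle \mid \mathit{content}_2\langle v\rangle \mid \prod_{i=3}^{n} \mathit{content}_i\langle f_i\rangle \mid \mathit{current}\langle v\rangle \mid P
\end{array}
\]

These two reductions prove the viable replication for viruses of the
class $V_{III}$. An identical approach can be used to provide proofs
for the remaining classes. □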
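The two phases of the proof can be replayed on a toy model. The Python sketch below is a sequential approximation written for illustration only: `ToySystem` and `make_class_iii_virus` are hypothetical names, resources are indexed from 0, r is the simple copy and t returns the next writing access at each iteration, as in the parameterization chosen for the proof.

```python
class ToySystem:
    """Toy stand-in for the system context: n executable resources."""

    def __init__(self, programs):
        self.content = list(programs)   # contents of the resources
        self.current = None             # internal channel `current`

    def proc_exec(self, program, *args):
        self.current = program          # sys_updt(p)
        return program(self, *args)

    def sys_ref(self):
        return self.current

    def sw(self, k, new):               # writing access of resource k
        self.content[k] = new

    def se(self, k, *args):             # execution access of resource k
        return self.proc_exec(self.content[k], *args)


def make_class_iii_virus():
    iteration = [0]                     # drives t(): i-th run targets resource i

    def v(system, *args):
        def loc_rep(code, target):      # r(x, w) = w(x): simple copy
            system.sw(target, code)

        def loc_targ():                 # target research routine t()
            k = iteration[0]
            iteration[0] += 1
            return k

        # Class III: local replication, self-reference obtained from the system.
        loc_rep(system.sys_ref(), loc_targ())
        return "payload"                # stands for the payload P

    return v


def benign(system, *args):
    return "benign"


machine = ToySystem([benign, benign, benign])
virus = make_class_iii_virus()
first = machine.proc_exec(virus)        # initial infection
```

After the initial execution the first resource holds the virus; an execution request on that resource then infects the second one, while the third remains untouched, which is the iterated behavior required by viable replication.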

3.4. Distributed worm propagation 

The propagation mechanism for worms is similar to virus replication.
The difference lies in the scope of the extrusion: the abstract
definition of the worm is no longer extruded to a local resource
through a writing channel, but to a remote system context. This
topology can be defined as contexts nested on two levels. A first
local system context, similar to the one from Section 3.3, is
included in a global architectural context containing parallel remote
systems and communication facilities between them (a computer network
topology for example):

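Local context: Let us define a new propagation service in the local
context. The principle of the propagation service is similar to
replication (the propagation function p replaces the function r).
This new local context can be simplified by removing the resource
definitions used to store the replicated code:

\[
\begin{array}{l}
D_{prop} \stackrel{\mathrm{def}}{=} \mathit{sys}_{prop}(in, out) \triangleright \mathbf{return}\ p(in, out)\ \mathbf{to}\ \mathit{sys}_{prop} \\
C_{lsys} \stackrel{\mathrm{def}}{=} \mathbf{def}\ D_{proc} \wedge D_{ref} \wedge D_{prop}\ \mathbf{in}\ (\mathit{current}\langle null\rangle \mid [\cdot])
\end{array}
\]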

Remote context: The remote context must provide communication facilities between the different systems. The ComChannel definition enables the generation of two-way communication channels. Processing of the data transmitted by the local context is delegated to the remote parallel contexts running inside the global architecture. In order to simplify the model, the definition below only considers a single process P_rsys modeling the remote system, but several systems can run in parallel. In addition, the resources and services of P_rsys can also be refined:

P_rsys ≝ let d = rcv() in P_processing

C_garch ≝ def ComChannel() ▷
            (def send<m> | receive() ▷ return m to receive in
             return send, receive to ComChannel)
          in let sd, rcv = ComChannel()
          in (P_rsys | [.])

Definition 6: Let W be a worm able to propagate to a remote system using P, S and T, the definitions of three sub-processes respectively responsible for propagation (the counterpart of replication for viruses), access to the self-reference and the search for a potential target:

• P ≝ loc_prop(in, out) ▷ return p(in, out) to loc_prop

• S ≝ loc_ref() ▷ return w to loc_ref

• T ≝ loc_targ() ▷ return t() to loc_targ

Four classes of worms can be defined using these
primitives and the system services:

• (Class I) W is totally autonomous:

W_I ≝ def w(~x) ▷ (def S ∧ P ∧ T in
        loc_prop(loc_ref(), loc_targ()).P') in proc_exec(w, ~a)

• (Class II) W uses an external propagation mechanism provided by the system:

W_II ≝ def w(~x) ▷ (def S ∧ T in
        sys_prop(loc_ref(), loc_targ()).P') in proc_exec(w, ~a)

• (Class III) W uses an external access to the self-reference provided by the system:

W_III ≝ def w(~x) ▷ (def P ∧ T in
        loc_prop(sys_ref(), loc_targ()).P') in proc_exec(w, ~a)

• (Class IV) W uses only external services:

W_IV ≝ def w(~x) ▷ (def T in
        sys_prop(sys_ref(), loc_targ()).P') in proc_exec(w, ~a)

The four classes of worms satisfy viable replication 
just like viruses do. The main difference comes from 
the extrusion of the w definition which is no longer 
bound to the local system but can be extended to the 
remote context. 

Remark 1: Just like replication, the propagation function can be refined for more complexity. The simplest case remains the simple copy:
D_prop ≝ def p(in, out) ▷ out<in>
For more complex cases such as email worms, intermediate functions can be introduced, with their counterparts in the remote system to reverse the processing:
D_prop ≝ def p(in, out) ▷
           out<concat(SMTPheader, base64(in))>
P_rsys ≝ let d = rcv() in base64decode(body(d))
The search routine t() can be defined accordingly to parse the address books of different mail clients.

4. Modeling complex malicious behaviors 

Modeling complex behaviors demonstrates the value of the parametric approach. This section gives examples of complex refinements both for the replication function r and the payload process P from the previous section.

4.1. Companion viruses 

Companion viruses are a particular case of the parametric definition of Section 3.3. Their specificity lies in their replication mechanism: instead of overwriting or modifying the content of the resource targeted by the infection, the virus replaces this resource from the system perspective. Companion viruses fall into two classes depending on whether the replacement is achieved (a) by diverting the file system naming mechanism or (b) by diverting the hierarchy of execution [5, Chpt.8]. The replication function is consequently more complex and requires three steps:

1-a) Renaming or relocation of the target of the infection.
1-b) Modification of the system hierarchy of execution.

2) Creation of a new resource under the target name.

3) Copy of the viral code into the replacing resource.

Modeling the file system 

In order to model a companion virus, it becomes necessary to introduce a refined model of the file system. The purpose of the file system is to associate a resource name (a system path) with a given location and access channels (reading, writing, execution). The principle is thus compatible with our model of executable resources. A file system is therefore introduced into the environment defined in Section 3.3; it is responsible for maintaining a list of 4-tuples associated with the different files. Let us give a first definition of a file entry as well as its access and update methods:

E_fs ≝ E(n_init, sr_init, sw_init, se_init) ▷
  def n_init(c, p) | entry<sr, sw, se> ▷
    (if [c = dl] then 0 else
     if [c = mv] then E(p, sr, sw, se) else
     if [c = ex] then se(p) | entry<sr, sw, se> else
     if [c = rd] then p(sr()) | entry<sr, sw, se> else
     if [c = wr] then sw(p) | entry<sr, sw, se>)
  in entry<sr_init, sw_init, se_init>
The file system provides different commands to manage these entries. Each command takes the file name as input, and the file system is responsible for executing the command on the right resource (resources are basically modeled as executing processes):
- new to create new files,
- delete to delete existing files,
- move to modify the name of a file (modifying the name only corresponds to a renaming operation, whereas modifying the complete path is a relocation),
- execute to execute a given file,
- read to read from a given file,
- write to write to a given file.

These file system commands are modeled as definitions, whereas the entries of the file system constitute a set of parallel processes. A file system definition is given below, where the executing parallel processes correspond to the already existing files referred to by the name vector ~n:

M_fs ≝ def E_fs in
  def new(n_new) ▷ E(n_new, R_exec(null))
    ∧ delete(n_del) ▷ n_del(dl, null)
    ∧ move(n_old, n_new) ▷ n_old(mv, n_new)
    ∧ execute(n_exe, arg) ▷ n_exe(ex, arg)
    ∧ read(n_rd, buffer, arg) ▷ n_rd(rd, buffer)
    ∧ write(n_wr, data) ▷ n_wr(wr, data)
  in Π_{n_i ∈ ~n} (def n_i(c, p) | entry_i<sr, sw, se> ▷
       (if [c = dl] then 0 else
        if [c = mv] then E(p, sr, sw, se) else
        if [c = ex] then se(p) | entry_i<sr, sw, se> else
        if [c = rd] then p(sr()) | entry_i<sr, sw, se> else
        if [c = wr] then sw(p) | entry_i<sr, sw, se>)
     in entry_i<sr_i, sw_i, se_i>)
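
The command dispatch described above can be sketched in Python. This is only an illustration under simplifying assumptions: a dictionary stands in for the set of parallel entry processes, a single stored callable replaces the three channels sr, sw and se, and the class name FileSystem is hypothetical (it does not come from the model).

```python
from typing import Callable, Dict, Optional

# Assumption: the content of a resource is a callable taking one argument.
Program = Callable[[str], str]


class FileSystem:
    """Toy counterpart of M_fs: each command dispatches on a file name."""

    def __init__(self) -> None:
        # None models an empty executable resource, i.e. R_exec(null).
        self.entries: Dict[str, Optional[Program]] = {}

    def new(self, name: str) -> None:
        self.entries[name] = None

    def delete(self, name: str) -> None:
        # Command dl: the entry is not re-emitted and disappears.
        del self.entries[name]

    def move(self, old: str, new: str) -> None:
        # Command mv: same content and accesses, registered under a new name.
        self.entries[new] = self.entries.pop(old)

    def write(self, name: str, data: Program) -> None:
        if name not in self.entries:
            raise KeyError(name)
        self.entries[name] = data

    def read(self, name: str) -> Optional[Program]:
        return self.entries[name]

    def execute(self, name: str, arg: str) -> str:
        program = self.entries[name]
        if program is None:
            raise ValueError(f"{name} is an empty resource")
        return program(arg)
```

With this sketch, move keeps the stored content reachable under the new name only, which is the property the renaming step of a companion infection relies on in the model.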

Modeling the hierarchy of execution 



The hierarchy of execution may vary from one operating system to another. This introduces portability issues, which explains why companion viruses gaining preemption by modifying the hierarchy of execution are not very common [5, Chpt.8]. The most common case is that of companion viruses modifying the PATH variable in a Unix environment. Another example, somewhat outdated, concerns the DOS architecture, where executable files with the .com extension preempt those with the .exe extension. In fact, the hierarchy of execution relies on a shorter designation of programs (path or extension missing). These short designations are completed according to the hierarchy of execution. Let us first define a concatenation operator over names, denoted n_1 · n_2, and a projection operator π_n to recover the n-th concatenated element. A completion process must then be defined, which is parametric over a list of complements (file paths or extensions), ordered by increasing preemptiveness:

H_ex ≝
  complete(sn) | complist<c_0, ..., c_n> ▷
    let ln_0, ..., ln_n = sn · c_0, ..., sn · c_n in
    (if [ln_0 ∈ dv] then return ln_0 | complist<c_0, ..., c_n>
     else ... else
     if [ln_n ∈ dv] then return ln_n | complist<c_0, ..., c_n>)
  ∧ (preempt(c) | complist<c_0, ..., c_n> ▷
       complist<c, c_0, ..., c_{n-1}>)
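
A minimal Python sketch of this completion process follows. It assumes that complements are suffixes such as extensions (a path prefix would be concatenated on the other side), that the set of defined names dv is passed explicitly, and that the class name Completer is purely illustrative.

```python
from typing import List, Optional, Set


class Completer:
    """Toy counterpart of H_ex: an ordered list of name complements."""

    def __init__(self, complements: List[str]) -> None:
        self.complements = list(complements)

    def complete(self, short_name: str, defined: Set[str]) -> Optional[str]:
        # Try sn . c_0 first, then sn . c_1, and so on; the first candidate
        # found among the defined names is returned.
        for complement in self.complements:
            candidate = short_name + complement
            if candidate in defined:
                return candidate
        return None

    def preempt(self, complement: str) -> None:
        # complist<c_0, ..., c_n> becomes complist<c, c_0, ..., c_{n-1}>:
        # the new complement is tried first and the last one is dropped.
        self.complements = [complement] + self.complements[:-1]
```

For instance, with complements [".com", ".exe", ".bat"] and defined names {"prog.exe", "prog.bat"}, completing "prog" yields "prog.exe"; after preempt(".bat") the same request yields "prog.bat".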

The execution command of the file system must be modified accordingly to try name completion when the name of the program launched is unknown to the system, in other words when the program name is not in the set of defined names.

M_fs ≝ def E_fs in
  def new(n_new) ▷ E(n_new, R_exec(null))
    ∧ execute(n_exe, arg) ▷
        if [n_exe ∈ dv] then n_exe(ex, arg)
        else execute(complete(n_exe), arg)



Refining replication for companion viruses 

From Definition 5, the two classes of companion viruses can be obtained by refining the replication function r. Using this definition of a file system, a first companion virus V diverting the file naming mechanism can be defined as follows:

def r(v, n_targ) ▷
  move(n_targ, n_copy); new(n_targ); write(n_targ, v) in ...

The second class of companion viruses relies on the refinement of the file system that supports the execution hierarchy. Let us consider the target of the replication as a concatenated name ln_targ = sn_targ · ext. The preemptive companion virus can be defined as follows:

def r(v, ln_targ, ext) ▷
  preempt(ext_new); new(π_1(ln_targ) · ext_new);
  write(π_1(ln_targ) · ext_new, v) in ...

Model validation 

In order to validate the model, it is necessary to assess its relevance with regard to existing companion viruses. A parallel has thus been drawn between the different processes and definitions, and their real implementation. A recent example of a MacOS X virus circumventing the file naming mechanism has first been taken. The results are given in Table 2. The same work has been done for a second companion virus for Unix, diverting the execution hierarchy. The results are given in Table 3.

4.2. Stealth techniques inside Rootkits 

Up until now, the article has focused only on modeling self-replication, since it is one of the main characteristics of malware and in particular of viruses. In fact, the Join-Calculus is sufficiently expressive to describe other malicious behaviors such as stealth. Even if stealth is not a malicious technique on its own, once deployed in rootkits it becomes a powerful tool for attackers. Unfortunately, few formal works have been conducted on rootkit modeling [19], [8], [9]. Rootkits thus constitute an interesting choice to assess the expressiveness of the model, by proving it can be applied to concrete cases.

This section describes how rootkit behaviors can be defined in the parametric model by refinement of the payload process, which has not been detailed yet. Let us consider a piece of malware loading a rootkit from its body. Based on recursive functions, the definition published by Zuo and Zhou of viruses resident relatively to a system call is the closest result to our approach [19]. Unfortunately, recursive functions are not well suited to modeling reactive, persistent (non-terminating) programs such as rootkits. The Join-Calculus should offer far more flexibility.

Services provided by the rootkit 

Basically, a rootkit provides a certain number of services to the attacker through a command channel. Let us first define n processes S_1, ..., S_n corresponding to these services. A public channel com is provided to the attacker (through the network, based on various protocols such as IRC or P2P for the most widespread). This channel supports n different types of requests represented by the vector ~c = c_1...c_n. The names c_i themselves correspond to internal command channels which, in the case of rootkits, are often communication channels from the user space, where the client part is running, towards the services running in the kernel space. A proxy service relays the commands received on the public channel towards the internal channels. This client-server architecture can be defined as follows. First, a public communication channel com must be defined between the attacker A and the rootkit R_kit:

D_com ≝ def com() ▷
          (def send<m> | receive() ▷ return m to receive in
           return send, receive to com) in
        let sd, rcv = com() in (A | R_kit)
The rootkit first publishes the list of supported commands through the public channel. Once the list is transmitted, it launches the proxy service, which waits for requests from the attacker:

P_proxy ≝ let c, arg = rcv() in c(arg)

R_kit ≝ def c_1(arg) ▷ S_1 | P_proxy
          ∧ c_2(arg) ▷ S_2 | P_proxy
          ∧ ...
          ∧ c_n(arg) ▷ S_n | P_proxy
        in sd<~c>.P_proxy

In parallel, the attacker receives the available commands for the different services on the public channel. The obtained list is stored as the vector ~s. The attacker can then activate any service by sending a request containing the corresponding command:

A ≝ let ~s = rcv() in sd<s_1, arg_1>.sd<s_2, arg_2>...

Loading the rootkit 

Before installation, the rootkit is often stored in the malware body as an internal component. It must thus be extruded and loaded either conventionally through the driver manager or through diverted means. In both cases a specific loading process is required. Let us consider the conventional loading process by defining a driver manager service. This service basically receives the driver definition and launches its execution:

D_mdriv ≝ load(d) ▷ d()

In order to be accessible inside the malware, the rootkit must be abstracted to ease the loading:

M ≝ (...); def r() ▷ R_kit in load<r> | M'

The following loading process is obtained:

def D_mdriv in M →* def D_mdriv in M' | R_kit

System call hooking 



Companion Virus for Mach-O Executables ([18], 2007)
Platform: Mac OS X
Type: Companion virus based on the directory structure of Mach-O executables

Processes | Implementation
M_fs | MacOS X file system with the Mach-O executable structure in repositories: hierarchical tree and meta-information files.
E_fs | Info.plist containing information on the executable structure and the location of its elements.

Channels | Implementation
n_targ | The CFBundleExecutable field from Info.plist, which denotes the real executable, the target of the infection.
move | The cp command from the console.
create, write | The two commands are not detached and are realized by a single call to the command cp.

Table 2. Parallel with a Companion Virus for MacOS X based on file naming.



vcomp_ex_v1 ([5, Chpt.8], 2005)
Platform: Unix
Type: Companion virus modifying environment variables for preemptiveness

Processes | Implementation
M_fs | Unix file system.
E_fs | Inode entries for the existing files.
H_ex | The PATH environment variable.

Channels | Implementation
n_targ | An absolute file name composed of the short file name and its path.
preempt | The command export PATH=NEW_PATH:PATH.
create, write | The standard file API fopen and fwrite.

Table 3. Parallel with a Companion Virus for Unix based on execution hierarchy.



Finally, it is necessary to model the hooking mechanism, just like resident viruses in [19]. Beforehand, a new entity of the system must be defined: the system call table, which is considered as a resource. This entity only publishes the list of available system calls on demand. This list is modeled by a vector of channels ~sc which can only be modified by the kernel through a privileged writing access. This privileged access is modeled by the hook channel, which is considered private from the malware perspective: only the publish channel is returned at table creation:

D_tsc ≝ T_sc(t_init) ▷
  def (publish() | table<t>) ▷
        (return t to publish | table<t>)
    ∧ (hook(t_new) | table<t>) ▷ (table<t_new>)
  in table<t_init> | return publish to T_sc

To gain access to this privileged channel, the rootkit uses the system services in a diverted way, in particular the memory allocation services. Allocation services can be used to modify the page protection of a memory space (IoAllocateMdl under Windows [20, pp. 82-87] and kmalloc under Linux [21]). Generally speaking, allocation services take as input a base address b and a size s and return the result of the allocation. The hook channel is only leaked if the base address is equal to the address of the system call table scbase. In any other case a simple access is returned:



D_alloc ≝ alloc(b, s) ▷
  if [b = scbase] then return hook else return access

The interest of hooking for the rootkit is to define a set of false system calls R_fsc1, ..., R_fscm in order to hide files or processes, for example by filtering the original system calls. These malicious system calls are registered in a new table, which is a vector of m entries ~fsc = fsc_1...fsc_m containing their referring names:

D_fsc ≝ def fsc_1(arg) ▷ R_fsc1
          ∧ ...
          ∧ fsc_m(arg) ▷ R_fscm

R_kit ≝ def D_fsc in
          let scspace = alloc(scbase, scsize) in scspace(~fsc)

The system evolves along the following derivation, where the leak of the privileged writing channel through the allocation mechanism can be observed:

def D_tsc ∧ D_alloc in let pub = T_sc(~sc) in R_kit →*
def D_tsc ∧ D_alloc ∧ D_fsc in table<~fsc>



Model validation 

In order to validate the model, it is necessary to assess its relevance with regard to existing rootkits. A parallel has thus been drawn between the different processes and definitions, and their real implementation in different malware. The results are given in Tables 4, 5 and 6.



Suckit ([21], [22], 2001)
Platform: Linux
Type: kernel space, system call hooking

Processes | Implementation
M | sk, executable responsible for the rootkit installation from user space.
R_kit | core, kernel module embedded in sk to be loaded; it contains the provided services S_n.
P_proxy | backdoor, autonomous thread waiting for network requests.
D_mdriv | internal module of sk responsible for allocating kernel memory, for writing the core module, and for resolving the addresses normally resolved by the insmod command.
D_tsc | Linux system call table.
D_alloc | memory device /dev/kmem.
R_sc | hooked versions of the system calls fork, open, [illegible], kill, ...

Channels | Implementation
com (sd, rcv) | established socket between the attacker and the backdoor thread.
c | hooked version of the olduname system call (kept for compatibility) allowing communication between the backdoor thread and the kernel module core to transmit the different commands.
load | calls to internal functions of D_mdriv.
alloc | kmalloc.
hook | write function called with the address returned by kmalloc.
publish | sysenter instruction allowing the switch between user and kernel space according to the system call table.
fsc | calls to hooked functions through the replaced system call table.

Table 4. Parallel with a Linux Kernel Rootkit: Suckit.



Agony (Sources available on the net by Intox7, 2006)
Platform: Windows
Type: kernel space, system call hooking

Processes | Implementation
M | agony.exe, executable responsible for the rootkit installation from user space and for transmitting the commands.
R_kit | agony.sys, kernel module embedded as a resource in agony.exe. Once loaded, it contains the different services S_n.
P_proxy | agony.exe transmits the keyboard input to the driver.
D_mdriv | Windows driver manager called SCM (Service Control Manager).
D_tsc | SSDT (System Service Descriptor Table) containing the addresses of the Windows system calls.
D_alloc | Memory allocation services.
R_sc | hooked versions of the system calls defined in the kernel module: ZwQuerySystemInformationHook, ZwQueryDirectoryFileHook, ...

Channels | Implementation
com | Keyboard interface with the console application agony.exe.
c | DeviceIoControl, a Windows system call used to communicate with drivers.
load | Call to CreateService followed by StartService.
alloc | MmCreateMdl, now replaced by IoAllocateMdl.
hook | Writing operation to the newly allocated space.
publish | sysenter instruction allowing the switch between user and kernel space according to the system call table.
fsc | Addresses in memory of the new system calls defined in the kernel module.

Table 5. Parallel with a Windows Kernel Rootkit: Agony.



AgoBot ([23], first version in 2002)
Platform: Windows
Type: user space, hooking not supported

Processes | Implementation
M | Agobot, originally a P2P worm, supporting in prior versions propagation through vulnerabilities.
R_kit | CBot, C++ object defining the different services S_n as well as their handlers.
P_proxy | CIrc, C++ object responsible for the IRC communication with the attacker.
D_mdriv | CInstaller, C++ object responsible for copying the code and registering in the system (registry key).

Channels | Implementation
com | IRC communication established through the network.
c | call to the method HandleCommand from the object CBot.
load | calls to the methods CopyToSysDir and RegStartAdd from the object CInstaller.

Table 6. Parallel with a Windows User Rootkit: Agobot.



5. System resilience / replication detection 

Modeling facilities are not the only interest of process algebras. Since the first formal works of Cohen, it is well established that virus detection is an undecidable problem. However, thanks to this formalism, we will now try to identify some fragments of the Join-Calculus for which the detection problem remains decidable, up to a complexity factor. Let us consider an algorithm taking as input a system context C_sys[.]_{S∪R} and a process P abstracted by the definition p. This algorithm returns true if P is able to self-replicate inside the context.

Such an algorithm can be used either for checking the replication capability of a process or for assessing the resilience of a context to a viral class. An exhaustive procedure is described in Algorithm 1. The purpose of this algorithm changes depending on whether the context or the tested process varies:
Detection: Malware detection can be addressed by identifying replication attempts of various processes in a fixed system context.
Resilience: Just like in any other domain of computer security, system resilience is addressed by confronting systems with a known attack class. This problem can be addressed by identifying replication attempts of a given viral class in various system contexts. The viral class is defined through a fixed input process, which is known to be malware.



Algorithm 1 Replication detection.
Require: P which is abstracted by p
Require: C_sys[.]_{S∪R} where S is the set of services and R the resources
1: E_done ← ∅, E_next ← ∅, C ← C_sys[P]_{S∪R}
2: repeat
3:   E_succ ← {C' | C → C'}
4:   if ∃C' reached by a join pattern x<p> with x ∈ R or x ∉ (dv(P) ∪ S ∪ R) then
5:     return system is vulnerable to the replication of P
6:   end if
7:   E_done ← E_done ∪ {C}
8:   E_succ ← E_succ \ {C_d ∈ E_succ | ∃C_t ∈ E_done . C_d ≡ C_t}
9:   E_next ← E_next ∪ E_succ
10:  if infinite reaction on a join without apparition of new potential transitions then
11:    break
12:  end if
13:  Choose a new C ∈ E_next
14: until E_next = ∅
15: return system is not vulnerable to the replication of P
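
The exhaustive exploration of Algorithm 1 can be sketched as a worklist search in Python. This is an illustration under stated assumptions, not the algorithm itself: states are arbitrary hashable values, the caller supplies the one-step reduction as a function successors and the test of line 4 as a predicate replicates, and the max_states budget is a hypothetical stand-in for the loop-detection step of lines 10 to 12.

```python
from collections import deque
from typing import Callable, Hashable, Iterable


def detects_replication(
    initial: Hashable,
    successors: Callable[[Hashable], Iterable[Hashable]],
    replicates: Callable[[Hashable], bool],
    max_states: int = 10_000,
) -> bool:
    """Return True if a state satisfying `replicates` is reachable."""
    done = set()                  # E_done: states already expanded
    pending = deque([initial])    # E_next: states still to expand
    while pending:
        state = pending.popleft()
        if state in done:
            continue
        done.add(state)
        if len(done) > max_states:
            # Assumption: give up instead of detecting infinite reactions.
            raise RuntimeError("state budget exhausted, result unknown")
        for nxt in successors(state):   # E_succ
            if replicates(nxt):
                return True             # vulnerable to the replication of P
            if nxt not in done:
                pending.append(nxt)
    return False                        # no replication found


# Hypothetical two-step process: read its own definition "p", then write it.
def viral(state):
    step, content = state
    if step == "start":
        return [("read_self", content)]
    if step == "read_self":
        return [("write", "p")]
    return []


def benign(state):
    step, content = state
    return [("done", content)] if step == "start" else []


def resource_holds_p(state):
    return state[1] == "p"
```

Here detects_replication(("start", "f"), viral, resource_holds_p) reports a replication, whereas the same call with benign does not.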



Proposition 2: Detection of self-replication in the 
Join-Calculus is undecidable. 



Proposition 3: Detection of self-replication is 
decidable if the system context and the process are 
defined in the fragment of the Join-Calculus without 
name generation. 

Algorithm 1 uses a brute-force approach for state exploration. As a matter of fact, it was not designed for operational deployment but to study the decidability of the detection problem. Unsurprisingly, detection remains undecidable according to Proposition 2. However, according to Proposition 3, the problem can become decidable by restricting name generation. This restriction is not without impact on the system context. Forbidding name generation induces a fixed number of resources, without the possibility of dynamically creating new ones. But most importantly, without name generation, synchronous communication is no longer possible, in particular for services, which cannot generate fresh names to return values. Unique and fixed return channels must be specified instead.

Proof: In the algorithm, the set of states E_succ reached after a single reduction is finite because only internal transitions τ are considered. Internal transitions in the join-calculus are finitely branching [24]. Decidability thus depends on bounding the number of iterations (finite number of states potentially reached and infinite loop detection). To prove decidability, we reduce the detection problem to the coverability problem in Petri nets.

Let us consider the fragment of the join-calculus without name generation, i.e. with no nested definitions of the form def J ▷ (def J' ▷ P' in P) in Q. This fragment can be encoded in the asynchronous π-calculus without external choice. Let us consider an encoding similar to [25], except that the replication operator has been replaced by recursive equations in order to be consistent with the remainder of the proof:

[[Q | R]]_j = [[Q]]_j | [[R]]_j

[[x<v>]]_j = x̄v

[[def x<u> | y<v> ▷ Q in R]]_j = A | [[R]]_j
    where A ≝ x(u).y(v).([[Q]]_j | A)
Name generation being excluded and the process being considered in a closed context, scope restriction is absent from the encoding. We now reuse the approach in [26] to reduce the problem. Using the provided encoding, the process inside its system context can be encoded in the asynchronous π-calculus, resulting in a system of parametric equations satisfying the normalized form from [26].

This system is then encoded into equations from the Calculus of Communicating Systems (CCS). CCS is parameterless; however, without name generation, a channel α and a possible transmitted value a can be combined in a single channel <α, a>. Notice that this encoding reintroduces the external choice + to handle the combined channels. Just like in [26], the obtained equation system thus contains a set of parallel processes guarded by these channels. The only difference lies in the multiple join patterns of the join-calculus, which result in multiple channels guarding these processes:

A_i = Σ <α, a>.<α', a'>.(Π <α, a> | Π A_j)

In this equation system, replication detection is reduced to the problem of knowing whether one of the guarded processes A_i is activated over a channel <α, p>, where α ∈ R and p is the abstraction of P. This is typically a control reachability problem in CCS. It has been proven in [26] that control reachability can finally be reduced to the coverability problem in Petri nets. Although they are time and space consuming, there exist algorithms deciding coverability [27], which are thus able to detect any token in the (α, p) place, referring to the emission of the process definition on the channel α. □
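
To make the final step concrete, the following Python sketch decides coverability with the classical backward algorithm over upward-closed sets of markings. It is an illustration only and not necessarily the procedure of [27]; the encoding of a net as a list of (consumed, produced) vectors and the three-place example are assumptions introduced here.

```python
from typing import List, Tuple

Marking = Tuple[int, ...]
Transition = Tuple[Marking, Marking]   # (tokens consumed, tokens produced)


def covers(m: Marking, target: Marking) -> bool:
    """True if m holds at least as many tokens as target in every place."""
    return all(a >= b for a, b in zip(m, target))


def coverable(initial: Marking, transitions: List[Transition],
              target: Marking) -> bool:
    """Decide whether some marking covering `target` is reachable."""
    basis: List[Marking] = [target]    # minimal markings that can cover target
    frontier: List[Marking] = [target]
    while frontier:
        m = frontier.pop()
        for pre, post in transitions:
            # Smallest marking enabling the transition and landing above m.
            pred = tuple(max(mi - po, 0) + pr
                         for mi, po, pr in zip(m, post, pre))
            if any(covers(pred, b) for b in basis):
                continue               # already represented by the basis
            basis = [b for b in basis if not covers(b, pred)] + [pred]
            frontier.append(pred)
    return any(covers(initial, b) for b in basis)


# Hypothetical net with places (request, guard, ap):
# t0 lets the guard emit requests forever, t1 turns a request into a
# token in the ap place.
NET = [((0, 1, 0), (1, 1, 0)),
       ((1, 0, 0), (0, 0, 1))]
```

The search terminates even though t0 makes the net unbounded, because only minimal markings are kept in the basis. On this net, coverable((0, 1, 0), NET, (0, 0, 1)) holds while coverable((0, 0, 0), NET, (0, 0, 1)) does not.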

6. Policies to prevent malware propagation 

The previous section deals with the problem of malware detection through the self-replication characteristic. It has been proven that detection is decidable only under certain assumptions. The problems concerning decidability and the fact that detection is reactive and not proactive encourage the search for alternative solutions to fight malware. It is thus important to consider other, proactive approaches such as the prevention of malware propagation. This section first describes malware propagation as an illegal information flow and then considers different solutions for malware containment.

6.1. Non-infection property and isolation 

A different approach to counter the threat posed by malware is to reason in terms of information flow, as initiated by F. Cohen in [28]. Active research is currently being conducted to control illicit data flows between processes of different security levels [29], [30], [31]. One of the main results is the formalization of the non-interference property, which specifies that the behavior of a low-level process must not be influenced by an upper-level process. This non-interference property is used for addressing confidentiality issues.

Similarly, the replication process of malware can be compared to an illicit information flow of the viral code towards the system. Let us state the hypothesis that, contrary to malware, legitimate programs should not interfere with other processes implicitly through the system. This is typically an integrity issue, and the non-interference property must be adapted accordingly. We have thus defined in Theorem 1 a new property called non-infection, in reference to the original property of non-interference [29].

Theorem 1: (NON-INFECTION). Let us consider a process P placed into a system context considered stable (i.e. potential reactions to intrusions only). The property of non-infection is satisfied by P if the system evolves along the reaction C_sys[P] →* C'_sys[P'], and for any non-infecting test process T the equivalence C_sys[T] ≈ C'_sys[T] holds. The strength of the property is determined by the equivalence considered.

The non-infection property guarantees the integrity of the system context. With regard to this property, the natural question is which constraints a system context must satisfy to guarantee non-infection. Proposition 4 states that there exist systems preventing replication through resource isolation. This proposition in fact corresponds to a generalization of the network partitioning principle advocated by F. Cohen to fight virus propagation [28].

Proposition 4: In a system context made up of services and resources, the non-infection property can only be guaranteed by a tight isolation of the resources.

Proof: Let us consider a system context made up of services and resources (see Section 3.1) of the form: C_sys = def D_S ∧ D_R in R | [.]
By hypothesis the context is stable and will only react to intrusions from the process P placed inside. To prove that isolation is required, we show that any writing access to a resource, either direct or indirect, must be forbidden. Let us begin by enumerating the possible intrusion cases from the process P:

I. Intrusion towards a resource:

J ∈ D_R with J = x_1(~y_1) | ... | x_n(~y_n) ▷ R'

def D_S ∧ D_R\{J} ∧ J in R_0 | x_1(~z_1).R_1 | ... | x_m(~z_m).R_m | [.]
  --[ x_{m+1}(~z_{m+1}) | ... | x_n(~z_n) ]-->
def D_S ∧ D_R in R_0 | R_1 | ... | R_m | R'[~z/~y] | [.]

This can be simplified since, in our model, the x_i are only used to store the resource content, meaning that R_i = 0 for 1 ≤ i ≤ m. From there, there are three sub-cases for this transition.



1) Reading from the resource:

R' = x_1(~y_1) | ... | x_m(~y_m) | return ~y_1, ..., ~y_m to x_{m+1}. Once the return is consumed, the system recovers its initial state before the intrusion: the non-infection property is satisfied.

2) Writing to the resource:

R' = x_1(~y_{m+1}) | ... | x_m(~y_n) | return to x_{m+1}. Once the return is consumed, the original values y_i with 1 ≤ i ≤ m are substituted by values y_j with m+1 ≤ j ≤ n. It is thus not guaranteed that the system will recover its original state before the intrusion: the non-infection property may not be satisfied.

3) Executing the resource:

This sub-case is equivalent to an intrusion towards a service (see II.).

II. Intrusion towards a service:

J ∈ D_S with J = x_1(~y_1) | ... | x_n(~y_n) ▷ S

def D_S\{J} ∧ J ∧ D_R in R | [.]
  -->
def D_S ∧ D_R in S[~z/~y] | R' | [.]

S is of the form return f(z_1, ..., z_n) to x_1, which reduces to the null process when the return is consumed. The system modification thus depends on the nature of the function f. Once again, there are three sub-cases.

1) Definition of f accessing no resource or only through a reading channel: This case is identical to case I.1) and the non-infection property is satisfied.

2) Definition of f using a writing or creation channel for resources: This case is identical to case I.2) and the non-infection property may not be satisfied.

3) Definition of f accessing resources in execution: In this case, the solution depends on the content of the resource. The same test is applied recursively to this content until reaching case II.1) or II.2). □
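
The read and write sub-cases of the proof can be illustrated with a small Python check. It is a deliberately coarse sketch: plain equality of resource contents is assumed to stand in for the behavioural equivalence of Theorem 1, a process is any callable acting on the context, and the names Context and is_non_infecting are hypothetical.

```python
import copy
from typing import Callable, Dict


class Context:
    """Toy system context: named resources with read and write accesses."""

    def __init__(self, resources: Dict[str, str]) -> None:
        self.resources = dict(resources)

    def read(self, name: str) -> str:
        return self.resources[name]

    def write(self, name: str, value: str) -> None:
        self.resources[name] = value


def is_non_infecting(process: Callable[[Context], object],
                     context: Context) -> bool:
    """Run `process` on a copy of the context and compare resource states.

    Assumption: state equality replaces the equivalence of the theorem.
    """
    trial = copy.deepcopy(context)
    process(trial)
    return trial.resources == context.resources


def reader(ctx: Context) -> str:
    return ctx.read("r1")          # sub-case 1: the state is restored


def writer(ctx: Context) -> None:
    ctx.write("r1", "v")           # sub-case 2: the content is substituted
```

On a context holding {"r1": "f1"}, the reader passes the check and the writer fails it, matching sub-cases I.1) and I.2).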

6.2. Policies to restrict infection scope 

The non-infection property is impossible to guarantee in practice. The complete isolation of resources obviously cannot be considered in systems without losing most of their usefulness [28]. In fact, the hypothesis stated in Section 6.1 about legitimate programs is not always true in real cases. But if non-infection is impossible to deploy, approximate solutions can still contain malware propagation by restricting resource accesses spatially and temporally. Such a restriction does not completely prevent malware propagation, but the scope of the propagation is at least confined.

Such a restriction can be deployed by an access authority, blocking any unauthorized access to the resources and services of a system. A solution based on access tokens can be considered, either for spatial restriction (only programs and resources sharing the same token can access each other) or for time restriction in terms of counting executions (a given token can be used a fixed number of times). As defined in [32], an access authority is generically made up of two components: a Policy Decision Point (PDP), which can be seen as the token distribution mechanism, and a Policy Enforcement Point (PEP), which checks the token validity and thus must not be bypassed (Definition 7). The obligation to pass through a verification authority is similar to transitive non-interference, where high-level information can only transit to low-level channels through an intermediate [30]. This is reserved for future work.

Definition 7: An access authority is constituted of:
- A distribution process to deliver tokens, denoted $D_t$.
- A control mechanism providing interfaces chk to
submit tokens for checking.

The interfaces and the control mechanism are directly
embedded in the system. The control is securely
enforced (i.e. cannot be bypassed) if the system
without the distribution process satisfies the non-
infection property.
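
As a minimal sketch of Definition 7, the Python class below groups a distribution process and a chk interface. It is an illustration under assumptions, not the mechanism of [32]: the class name `AccessAuthority`, the notion of a string-named domain for spatial restriction and the `max_uses` counter for time restriction are all hypothetical choices made for this example.

```python
import secrets
from typing import Dict, Optional


class AccessAuthority:
    """Hypothetical access authority: token distribution plus chk interface."""

    def __init__(self) -> None:
        self._domain: Dict[str, str] = {}     # token -> domain it is valid for
        self._remaining: Dict[str, int] = {}  # token -> executions still allowed

    def distribute(self, domain: str, max_uses: int) -> str:
        """Distribution process: deliver a fresh random token."""
        token = secrets.token_hex(16)
        self._domain[token] = domain
        self._remaining[token] = max_uses
        return token

    def chk(self, token: Optional[str], resource_domain: str) -> bool:
        """Control interface: validate a token submitted for one access."""
        if token is None or token not in self._domain:
            return False                      # unknown or forged token
        if self._domain[token] != resource_domain:
            return False                      # spatial restriction
        if self._remaining[token] <= 0:
            return False                      # time restriction (use count)
        self._remaining[token] -= 1           # consume one execution
        return True


authority = AccessAuthority()
token = authority.distribute("documents", max_uses=2)
print([authority.chk(token, "documents") for _ in range(3)])
```

In this sketch a token that is rejected for a domain mismatch is not consumed, so only successful accesses count against the fixed number of uses.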

Example 1: Let T be a non-forgeable security token,
i.e. if unknown, the token cannot be rebuilt.
T must thus not be exported by the system context
$C_{sys}[\,\cdot\,]_{S \cup R}$, with $T \notin S$ and $T \notin R$. Control can
then be enforced at the resource and service level
using the interface chk, which compares the input
token with the security token T:

• def S_sv(t, arg) ▷
      if chk(t, T) then return f_sv(arg) else in ...

• def R_exec(f_0) ▷
      def (write(t, f_new) | content<f>) ▷
              if chk(t, T) then (return to write | content<f_new>)
              else content<f>
        ∧ (read(t) | content<f>) ▷
              if chk(t, T) then (return f to read | content<f>)
              else content<f>
        ∧ (exec(t, arg) | content<f>) ▷
              if chk(t, T) then (return f(arg) to exec | content<f>)
              else content<f>
      in content<f_0> | return read, write, exec to R_exec in ...
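
The executable resource of Example 1 can be mimicked in an ordinary language. The Python sketch below is an assumption-laden analogy rather than a translation of the Join-Calculus definition: the class `GuardedResource`, the helper `new_token` and the choice of returning `None` or `False` on a failed check are hypothetical, and a byte string compared in constant time stands in for the non-forgeable token T.

```python
import hmac
import secrets
from typing import Any, Callable, Optional


def new_token() -> bytes:
    """Hypothetical helper: draw a random token that is hard to guess."""
    return secrets.token_bytes(16)


class GuardedResource:
    """Resource whose read, write and exec interfaces all pass through chk."""

    def __init__(self, content: Callable[[Any], Any], token: bytes) -> None:
        self._content = content   # plays the role of content<f>
        self._token = token       # plays the role of T, never exported

    def _chk(self, t: bytes) -> bool:
        # Compare the submitted token with T in constant time.
        return hmac.compare_digest(t, self._token)

    def read(self, t: bytes) -> Optional[Callable[[Any], Any]]:
        return self._content if self._chk(t) else None

    def write(self, t: bytes, new_content: Callable[[Any], Any]) -> bool:
        if not self._chk(t):
            return False          # content<f> is left unchanged
        self._content = new_content
        return True

    def execute(self, t: bytes, arg: Any) -> Any:
        return self._content(arg) if self._chk(t) else None


T = new_token()
resource = GuardedResource(lambda x: x + 1, T)
resource.write(b"guessed token", lambda x: 0)   # rejected, content unchanged
print(resource.execute(T, 1))
```

A caller that never received T can neither read, overwrite nor run the content, which mirrors the else branches that simply restore content<f>.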

The example above is quite basic. It shows that
if T is not forgeable and no distribution mechanism
is responsible for its extrusion, a process placed in
the context will not be able to access any service or
resource. Access control mechanisms definitely help
to contain malware propagation. In fact, complete
access control mechanisms are already deployed in
two well-known security models, for Java [33] and
.NET [34]. In both models, managed code runs in
an isolated runtime environment (Java Virtual Machine
or Common Language Runtime) with controlled
access to resources. A schematic view of access
control in the .NET framework is given in Figure 3. A
parallel between the two models is given in the table
below. These two access control models are already
used to restrict malware propagation by limiting
the number of services and resources available to
untrusted code. For example, the Same Origin
Policy (SOP) forbids any remote code running inside
a web browser from accessing local resources. The
problem in current systems is that these controls are
restricted to managed languages and do not cover native
code. These works on malware prevention indicate that
extending control to native code would help to fight
malware propagation.



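Model                   | Java framework                   | .NET framework
------------------------+----------------------------------+----------------------------------
Token distribution      | Secure class loader              | Policy resolution of the Common
(process D)             |                                  | Language Runtime (CLR)
------------------------+----------------------------------+----------------------------------
Input for distribution  | Evidence (certificate, origin)   | Evidence (certificate, origin)
------------------------+----------------------------------+----------------------------------
Output (token T)        | Permission domain                | Permission set
------------------------+----------------------------------+----------------------------------
Access control          | Security Manager calling the     | Code Access Security (CAS)
(interface chk)         | Access Controller using          | enforced by the CLR
                        | checkPermission()                |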



7. Conclusion and perspectives 

This paper introduces the basis for a unified malware
model based on process algebra and more particularly
the Join-Calculus. Moving from the functional models
currently used in abstract virology to process-based
models does not result in a loss of expressiveness. The
fundamental results are supported by the new model:
characterization of self-replication, undecidability
of detection, and isolation as perfect prevention.

In addition, the new model offers greater expressiveness
through its support of interactions, concurrency
and non-termination, which are commonly used in
recent malware. Beyond computational aspects,
these interactive notions ease the definition of complex
behaviors such as stealth in rootkits. But modeling
is not the only benefit; the use of process algebra has
provided new fundamental results in terms of detection
and prevention. Even if the global problem of virus
detection remains undecidable in this formalism, a
fragment of the Join-Calculus where detection becomes
decidable has been identified. With regard to
prevention, the property of non-infection has been
precisely defined, as well as solutions to restrict malware
propagation.

In fact, just like non-interference, non-infection is
a property which proves too strong for real cases.
Approximate solutions based on security tokens have
been outlined in the paper, but future work could
weaken the property. Looking
at existing works in process algebra, a promising
perspective is to associate security levels with processes
through a typing mechanism.

8. Works in progress 

8.1. Security levels and typing 

Theorem 2: (RESTRICTED NON-INFECTION).
Let us consider a process typed with a potential
risk, $\Gamma \vdash^{risk} P$, placed into a system context
considered stable (i.e. potential reactions to intrusions
only) and typed as legitimate, $\Gamma \vdash^{leg} C_{sys}[\,\cdot\,]$. The property of
non-infection is satisfied by P if the system evolves
along the reaction $C_{sys}[P] \longrightarrow C'_{sys}[P']$ and, for any
non-infecting test process $\Gamma \vdash^{leg} T$, the equivalence
$C_{sys}[T] \approx C'_{sys}[T]$ is true. The strength of the
property is determined by the equivalence considered.

Just like the original non-infection, restricted
non-infection is only achieved if legitimate and risky
resources are completely isolated from each other. This
property is weaker: it allows legitimate resources
to modify one another. Other typings
may be defined as controls either for resource
accesses (parallel with behavioral blocking) or
information flows (parallel with tainting techniques) to prevent
self-replication.

8.2. Stealth and observation 

Let O be a process monitoring one or several
behaviors $\mu = \mu_1 \dots \mu_n$ in a system S (IDS or
behavioral AV). When an attack is detected, the
process switches to a state $O_i$ signaling the detection:

$O \mid S \longrightarrow O_i \mid S'$.

Definition 8: (STEALTH) Let us define stealth
relatively to an observer (and no longer to a system
call [19]). The definition of an observer determines
the observed behaviors, which may be composed
of system calls. A malicious code M is stealthy with
respect to an observer O if:
$O \mid S \mid M \not\longrightarrow O_i \mid S'$.

Figure 3. .NET Security Model for access control.

A malicious code can be stealthy with respect to any
given legitimate observer. However, malware necessarily
modifies legitimate programs, otherwise the non-infection
property would be satisfied and it would not be
malware. An observer can thus always be found to
detect a given malware. In other words, absolute stealth for
malicious code is impossible. This result is promising
for behavioral observation. A parallel can be drawn
with E. Filiol's result stating that it is not possible to
introduce a stealthy malicious code without significantly
modifying the distribution for an estimator [8].
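
The relativity of stealth to the observer can be illustrated on finite traces. The Python sketch below is a toy analogy, not the process-level definition: traces are assumed to be lists of (action, target) pairs, and the names `syscall_observer`, `integrity_observer` and `is_stealthy`, as well as the sample trace, are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Event = Tuple[str, str]                  # (action, target), an assumed encoding
Observer = Callable[[List[Event]], bool]  # True when the observer signals detection


def syscall_observer(watched_actions: Set[str]) -> Observer:
    """Observer that fires when one of the watched actions appears."""
    def observe(trace: List[Event]) -> bool:
        return any(action in watched_actions for action, _ in trace)
    return observe


def integrity_observer(legitimate: Set[str]) -> Observer:
    """Observer that fires when a legitimate program is modified."""
    def observe(trace: List[Event]) -> bool:
        return any(action == "write" and target in legitimate
                   for action, target in trace)
    return observe


def is_stealthy(trace: List[Event], observer: Observer) -> bool:
    """A trace is stealthy for an observer that never signals detection."""
    return not observer(trace)


# Hypothetical malware trace: it infects a legitimate program without networking.
trace = [("read", "editor.exe"), ("write", "editor.exe")]
network_watch = syscall_observer({"connect", "send"})
integrity_watch = integrity_observer({"editor.exe"})
print(is_stealthy(trace, network_watch), is_stealthy(trace, integrity_watch))
```

The same trace is stealthy for the network observer but not for the integrity observer, which matches the argument that modifying legitimate programs always leaves some observer able to detect the malware.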